{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "7H3yTncQfoym"
      },
      "source": [
        "##### Copyright 2019 The TensorFlow Authors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "cellView": "form",
        "colab": {},
        "colab_type": "code",
        "id": "h1CiDh7CfqON"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "pp-UaomMQJNo"
      },
      "source": [
        "# A neural network from scratch\n",
        "\n",
        "_Notebook orignially contributed by: [am1tyadav](https://github.com/am1tyadav)_\n",
        "\n",
        "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/examples/blob/master/community/en/nn_from_scratch.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/examples/tree/master/community/en/nn_from_scratch.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on GitHub\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "\u003c/table\u003e"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "NBhbug0Qgmuu"
      },
      "source": [
        "## Overview\n",
        "In this notebook, we will create a Neural Network model from scratch to solve a Multi-Class Classification problem using TensorFlow. We are going to use the popular MNIST dataset (Grayscale images of hand-written digits from 0 to 9). If you already have some idea on how to create and train models using a Keras but want to dive a bit into the lower level workings of Neural Networks, then this notebook would, hopefully, be useful to you."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "zK2SZnszhSYk"
      },
      "source": [
        "## Setup\n",
        "Let's start by importing the libraries and functions that we will need. The MNIST dataset is easily accessible from Keras. We will use the `to_categorical` helper function from Keras utilities to convert the labels to one-hot encoded representations."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "S1sepk9uMddm"
      },
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "import matplotlib.pyplot as plt\n",
        "import tensorflow as tf\n",
        "assert tf.__version__.startswith('2')\n",
        "\n",
        "from tensorflow.keras.datasets import mnist\n",
        "from tensorflow.keras.utils import to_categorical\n",
        "\n",
        "print('TensorFlow version:', tf.__version__)\n",
        "print('Is Executing Eagerly?', tf.executing_eagerly())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "CH3QqWQ1RiP2"
      },
      "source": [
        "## Define a neural network class\n",
        "\n",
        "Next, let's define a class called `NNModel` for our neural network model. This class will encompass functionality typical to neural networks like forward propagation, computing costs and so on. After this class is ready, we will be able to use an instance of this class to simply call a method which will train the model on the training set and then return predictions on a given test set. When the class is initialized, we can pass on a list of number of nodes for the various layers including for the input and the output layers. All the layers are going to be densely connected. If, for example, we have 4 dimensional feature vectors, 3 output classes and want to use 2 hidden layers with 8 nodes each, we will pass on a list of number of nodes when instantiating the class like this:\n",
        "```\n",
        "layers = [4, 8, 8, 3]\n",
        "model = NNModel(layers)\n",
        "```"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "m3za66lwMjWX"
      },
      "outputs": [],
      "source": [
        "class NNModel():\n",
        "  def __init__(self, layers):\n",
        "    self.costs = [] # for storing in-training costs\n",
        "    self.W = [] # for storing trainable weights\n",
        "    self.layers = layers # a list of layers: each item is number of nodes\n",
        "    self.L = len(layers) # total number of layers including input \u0026 output"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "za_UE4kZnKo7"
      },
      "source": [
        "## Initialize parameters\n",
        "\n",
        "We will use `tf.random.normal` to get random values from normal distribution for the weights of our network. We are not using any biases in this example, just the weights."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "ejidHNJnie7h"
      },
      "outputs": [],
      "source": [
        "zeros_init = tf.zeros_initializer()\n",
        "\n",
        "class NNModel(NNModel):\n",
        "  def initialize_params(self):\n",
        "    for layer in range(1, self.L):\n",
        "      self.W.append(tf.Variable(tf.random.normal([self.layers[layer],\n",
        "                                                  self.layers[layer-1]])))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "X9NBnWhSnvn4"
      },
      "source": [
        "## Forward propagation\n",
        "\n",
        "To perform one step of forward propagation, we will define the function given below. One step means computing on one single batch of training examples. First, we compute the linear outputs followed by computation of the activation outputs. In this model, we are only going to use the _relu_ activation function. Note that the output layer does not need to have this activation since a probability distribution can be automatically calculate when using the `categorical_crossentropy` loss later. When the computations for all the layers are done, the final linear output is returned by this function."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "Y84daISunw_X"
      },
      "outputs": [],
      "source": [
        "class NNModel(NNModel):\n",
        "  def forward_prop(self, x_batch):\n",
        "    A = []\n",
        "    Z = []\n",
        "    # compute linear and activation outputs for all the nodes\n",
        "    A.append(tf.transpose(a=x_batch))\n",
        "    for layer in range(1, self.L):\n",
        "      Z.append(tf.matmul(self.W[layer-1], A[layer-1]))\n",
        "      if layer != self.L - 1: # No activation is applied to the output layer\n",
        "        A.append(tf.nn.relu(Z[layer-1]))\n",
        "    return tf.transpose(a=Z[self.L-2])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "KVjT6C2SqmH0"
      },
      "source": [
        "## Make predictions\n",
        "\n",
        "We will write a function to calculate predictions for a batch of examples."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "KNSJqjY7qnCe"
      },
      "outputs": [],
      "source": [
        "class NNModel(NNModel):\n",
        "  def predict(self, x_batch):\n",
        "    Z = self.forward_prop(x_batch)\n",
        "    return tf.argmax(input=Z, axis=1)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "8c4bNlGLrd5T"
      },
      "source": [
        "## The training loop\n",
        "\n",
        "Finally, we will write a function to run the training loop. This function will take arguments for the training set, the test set, number of epochs and batch size. We will first initialize the weights, then create use the _Adam Optimizer_ algorithm provided in TensorFlow.\n",
        "\n",
        "The training loop itself uses a `GradientTape` context for each batch. In this context, we will calculate costs for all the batches calling the `categorical_crossentropy` loss function given in TensorFlow which can take a logits tensor if the `from_logits` argument is set to True. Then we will use the context to calculate gradients and update the weights."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "K8Sh1UaQrc5D"
      },
      "outputs": [],
      "source": [
        "class NNModel(NNModel):\n",
        "  def train(self, x_train, y_train, x_test, y_test, epochs=10, batch_size=128):\n",
        "    self.initialize_params()\n",
        "    m = x_train.shape[0]\n",
        "    \n",
        "    optimizer = tf.optimizers.Adam()\n",
        "    \n",
        "    for epoch in range(epochs):\n",
        "      epoch_cost = 0\n",
        "      for batch in range(int(m/batch_size)):\n",
        "        x_batch = x_train[(batch*batch_size):(batch*batch_size+batch_size)]\n",
        "        y_batch = y_train[(batch*batch_size):(batch*batch_size+batch_size)]\n",
        "        \n",
        "        # Compute the cost for this batch within the GradientTape context\n",
        "        with tf.GradientTape() as tape:\n",
        "          Z = self.forward_prop(x_batch)\n",
        "          batch_loss = tf.losses.categorical_crossentropy(y_batch, Z,\n",
        "                                                          from_logits=True)\n",
        "          batch_cost = tf.reduce_mean(batch_loss)\n",
        "        # Use the GradientTape context to automatically compute gradients\n",
        "        grads = tape.gradient(batch_cost, self.W)\n",
        "        optimizer.apply_gradients(zip(grads, self.W))\n",
        "        \n",
        "        epoch_cost += batch_cost\n",
        "      \n",
        "      self.costs.append(epoch_cost.numpy())  \n",
        "      print('Epoch {}/{}. Cost: {:.2f}'.format(epoch+1, epochs,\n",
        "                                               epoch_cost.numpy()))\n",
        "    \n",
        "    preds = self.predict(x_test)\n",
        "    return preds"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "8eCxbz9ZSwjr"
      },
      "source": [
        "## Process the dataset\n",
        "\n",
        "Now that the `NNModel` class is complete, let's import the dataset. We will also convert the labels to their one-hot encoded representations, normalize the pixel values for all examples (You can also try normalizing by first subtracting the mean from all pixel values and then dividing by the total range of values but simply dividing by the total range also works well). Finally, we will reshape the examples on both the sets to unroll then from 28 by 28 arrays to 784 dimensional vectors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "eAN1ONp6MvI7"
      },
      "outputs": [],
      "source": [
        "(x_train, y_train), (x_test, y_test) = mnist.load_data()\n",
        "\n",
        "x_test_orig = x_test\n",
        "\n",
        "y_train = to_categorical(y_train)\n",
        "y_test = to_categorical(y_test)\n",
        "\n",
        "x_train = x_train / 255.\n",
        "x_test = x_test / 255.\n",
        "\n",
        "x_train = np.float32(x_train)\n",
        "x_train = np.reshape(x_train, (60000, 784))\n",
        "x_test = np.float32(x_test)\n",
        "x_test = np.reshape(x_test, (10000, 784))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "268qdJGGTULP"
      },
      "source": [
        "## Train the model\n",
        "\n",
        "Now all we have to do is instantiate a model, pass in a list of layers with the number of units we want for each layer and call the `train()` method. We will need to pass the training set and the test set. The model will return predictions on the test set after the training is complete."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "t8nw7mdKMxvb"
      },
      "outputs": [],
      "source": [
        "model = NNModel([784, 128, 128, 10])\n",
        "\n",
        "preds = model.train(x_train, y_train, x_test, y_test)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "VioflTOhTnwl"
      },
      "source": [
        "## Plot the in-training performance\n",
        "\n",
        "Let's take a look at how the model performed during training in a couple of plots."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "azgmhikhM0EE"
      },
      "outputs": [],
      "source": [
        "plt.plot(range(10), model.costs)\n",
        "plt.xlabel('Epochs')\n",
        "plt.ylabel('Cost')\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "JccKWvNYTtUW"
      },
      "source": [
        "## Predictions\n",
        "\n",
        "Finally, we will look at some of the predictions. The wrong predictions are labeled in red color."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "-JwpigAlM2dF"
      },
      "outputs": [],
      "source": [
        "plt.figure(figsize=(10,10))\n",
        "\n",
        "for i in range(25):\n",
        "  plt.subplot(5, 5, i + 1)\n",
        "  plt.imshow(x_test_orig[i], cmap='binary')\n",
        "  plt.xticks([])\n",
        "  plt.yticks([])\n",
        "  pred = np.squeeze(preds[i])\n",
        "  label = np.argmax(y_test[i])\n",
        "  if pred == label:\n",
        "    col = 'g'\n",
        "  else:\n",
        "    col = 'r'\n",
        "  plt.xlabel('Pred: {} Label: {}'.format(pred, label), color=col)\n",
        "\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "HLZ7C8akgGDF"
      },
      "source": [
        "You should get pretty decent predictions - for me, only 1 out of 25 was a wrong prediction. That may be different for you, of course, but you should get largely correct predictions."
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "collapsed_sections": [],
      "name": "nn_from_scratch.ipynb",
      "private_outputs": true,
      "provenance": [],
      "toc_visible": true,
      "version": "0.3.2"
    },
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
